Succinct Dictionary Matching with No Slowdown

Author

  • Djamal Belazzougui
Abstract

Dictionary matching is a classical problem in string matching: given a set S of d strings of total length n characters over a (not necessarily constant) alphabet of size σ, build a data structure so that we can find in any text T all occurrences of strings belonging to S. The classical solution for this problem is the Aho-Corasick automaton, which finds all occ occurrences in a text T in time O(|T| + occ) using a data structure that occupies O(m log m) bits of space, where m ≤ n + 1 is the number of states in the automaton. In this paper we show that the Aho-Corasick automaton can be represented in just m(log σ + O(1)) + O(d log(n/d)) bits of space while still answering queries in O(|T| + occ) time. To the best of our knowledge, the currently fastest succinct data structure for the dictionary matching problem uses O(n log σ) space while answering queries in O(|T| log log n + occ) time. In this paper we also show how the space occupancy can be reduced to m(H0 + O(1)) + O(d log(n/d)), where H0 is the empirical entropy of the characters appearing in the trie representation of the set S, provided that σ < m^ε for any constant 0 < ε < 1. The query time remains unchanged.
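For orientation, the classical Aho-Corasick automaton that the abstract takes as its baseline can be sketched as follows. This is a plain pointer-based version with O(|T| + occ) matching, not the succinct encoding the paper proposes; the function names `build_automaton` and `find_all`, and the `dict_link` array used to report matches without scanning empty states, are illustrative choices rather than anything taken from the paper. Patterns are assumed to be non-empty.

```python
from collections import deque

def build_automaton(patterns):
    """Build a pointer-based Aho-Corasick automaton (illustrative, not succinct)."""
    goto = [{}]   # goto[s][c] -> child of state s on character c (the trie)
    fail = [0]    # fail[s] -> state for the longest proper suffix of s in the trie
    out = [[]]    # out[s] -> indices of patterns that end exactly at state s
    for idx, pat in enumerate(patterns):
        s = 0
        for c in pat:
            if c not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].append(idx)
    # dict_link[s]: nearest state on the failure chain of s that ends a pattern
    # (0 means "none", since the root ends no non-empty pattern).
    dict_link = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0
    while queue:
        s = queue.popleft()
        for c, t in goto[s].items():
            f = fail[s]
            while f and c not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(c, 0)
            dict_link[t] = fail[t] if out[fail[t]] else dict_link[fail[t]]
            queue.append(t)
    return goto, fail, out, dict_link

def find_all(text, patterns, automaton):
    """Return (start_position, pattern) for every occurrence in text."""
    goto, fail, out, dict_link = automaton
    hits = []
    s = 0
    for i, c in enumerate(text):
        while s and c not in goto[s]:
            s = fail[s]
        s = goto[s].get(c, 0)
        # Walk only states that actually end a pattern, so reporting costs O(occ).
        r = s if out[s] else dict_link[s]
        while r:
            for idx in out[r]:
                hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
            r = dict_link[r]
    return hits

patterns = ["he", "she", "his", "hers"]
automaton = build_automaton(patterns)
print(len(automaton[0]))                       # number of states m
print(find_all("ushers", patterns, automaton))
```

For this dictionary the total length is n = 12 and the trie has m = 10 states, consistent with the bound m ≤ n + 1 stated in the abstract. Each state here stores a hash map and several machine-word pointers, which is the O(m log m)-bit cost that the succinct representation is designed to avoid.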

Similar resources

Succinct 2D Dictionary Matching with No Slowdown

The dictionary matching problem seeks all locations in a given text that match any of the patterns in a given dictionary. Efficient algorithms for dictionary matching scan the text once, searching for all patterns simultaneously. This paper presents the first 2-dimensional dictionary matching algorithm that operates in small space and linear time. Given d patterns, D = {P1, . . . , Pd}, each of...

Succinct Online Dictionary Matching with Improved Worst-Case Guarantees

In the online dictionary matching problem the goal is to preprocess a set of patterns D = {P1, . . . , Pd} over alphabet Σ, so that given an online text (one character at a time) we report all of the occurrences of patterns that are a suffix of the current text before the following character arrives. We introduce a succinct Aho-Corasick-like data structure for the online dictionary matching pro...

Design of Practical Succinct Data Structures for Large Data Collections

We describe a set of basic succinct data structures which have been implemented as part of the Succinct library, and applications on top of the library: an index to speed-up the access to collections of semi-structured data, a compressed string dictionary, and a compressed dictionary for scored strings which supports top-k prefix matching.

Dynamic 2D Dictionary Matching in Small Space

The dictionary matching problem preprocesses a set of patterns and finds all occurrences of each of the patterns in a text when it is provided. We focus on the dynamic setting, in which patterns can be inserted into and removed from the dictionary without reprocessing the entire dictionary. This article presents the first algorithm that performs dynamic dictionary matching on two-dimensional dat...

Applications of Succinct Dynamic Compact Tries to Some String Problems

The dynamic compact trie is a fundamental data structure for a wide range of string processing problems. In this paper, we report our recent work on succinct dynamic compact tries that store a set of strings of total length n in O(n log σ) space, supporting pattern matching and insert/delete operations in O((|P|/α) f(n)) time, where P is a pattern string, α = Θ(log_σ n), and f(n) = O((log log n) ...

Publication date: 2010